Objective: To identify differentially expressed genes, you must first conduct quality control assessments of your experiments. Next, perform statistical diagnostic analyses to detect and correct any experimental biases.


The experiments were performed comparing one cell culture incubated at atmospheric oxygen conditions (call normoxic and labelled using Green dye) and another one incubated in 1% O2 (call hypoxic and labelled using Red dye).


Data loading

library(marray)

The R package Marray offers several functions to:

  • Read GPR files
  • Draw graphical representations of microarray results (foreground and background signals, missing values, MAplots, etc.)
  • Perform the normalization between Red and Green signals.

GPR file

Input : A GPR file with detailed information for each spot on the slide (gene name, Red and Green intensity values, background intensities and other statistics).

# Read the GPR file using the marray package function read.GenePix
rawdata <- read.GenePix(fnames="dataFile1_normAnalysis.gpr",
                        path= "/shared/projects/2528_ens_master2lf_fgat/data/microarrays")
## Reading ...  /shared/projects/2528_ens_master2lf_fgat/data/microarrays/dataFile1_normAnalysis.gpr

Note : This function reads a GPR file and creates objects of class marrayRaw. In these objects, you can find, for instance, vectors with intensity values (rawdata@maRf or rawdata@maGf). These vectors can be manipulated using classical R functions like “summary()”, “hist()”, etc.


Data exploration

# f is for foreground
# Intensity values in red/hypoxic channel
head(rawdata@maRf)
##      /shared/projects/2528_ens_master2lf_fgat/data/microarrays/dataFile1_normAnalysis.gpr
## [1,]                                                                                  992
## [2,]                                                                                 1907
## [3,]                                                                                  559
## [4,]                                                                                  645
## [5,]                                                                                32939
## [6,]                                                                                  681
# Intensity values in green/normoxic channel
head(rawdata@maGf)
##      /shared/projects/2528_ens_master2lf_fgat/data/microarrays/dataFile1_normAnalysis.gpr
## [1,]                                                                                 2561
## [2,]                                                                                 2585
## [3,]                                                                                 1588
## [4,]                                                                                 1604
## [5,]                                                                                43755
## [6,]                                                                                 1732

Visualisation of foreground signal over the microarray glass slide

# Red/hypoxic signals
image(rawdata,
      xvar = "maRf",
      main = "Hypoxic signal (with flags)")

## [1] FALSE
## NULL
# Green/normoxic signals
image(rawdata,
      xvar = "maGf",
      main = "Normoxic signal (with flags)")

## [1] FALSE
## NULL

Low quality spots filtering

Low quality spots will be eliminated from further analyses. Intensity values will be replaced with the R symbol “NA” (Not Available).

# You work on a copy of the rawdata
rawdataQC <- rawdata

#Background values to NA
rawdataQC@maRb[rawdataQC@maW < 0] = NA
rawdataQC@maGb[rawdataQC@maW < 0] = NA

#Signal values to NA
rawdataQC@maRf[rawdataQC@maW < 0] = NA 
rawdataQC@maGf[rawdataQC@maW < 0] = NA

MA plot visualisation

# You work on a copy of the rawdata
rawdataQCnoBg <- rawdataQC

#Replace all background by 0 as by default background is subtracted from foreground
rawdataQCnoBg@maGb[] = 0
rawdataQCnoBg@maRb[] = 0

plot(rawdataQCnoBg, legend.func = NULL, lines.func = NULL, ylim=c(-5,5), main = "MA plot before normalization")


Normalisation

Comparison of Red/Hypoxic and Green/Normoxic global signals

Draw the MA plot between Red/Hypoxic and Green/Normoxic signals.
  • Is there any experimental bias?
  • What information gives the boxplot representation?
  • What kind of normalization method needs to be applied?


plot(rawdataQCnoBg, main = "MA plot before normalization")

boxplot(rawdataQCnoBg, main = "Boxplot before normalization")

Comparison of different normalization procedures

Normalize the intensity measures between Red/Hypoxic and Green/Normoxic signals.
  • Try 3 different normalization methods “median”, “global loess” and “print tip loess.
  • Draw the associated MA plot and BoxPlot (after normalization). What are the differences with the graphs obtained before normalization?
  • Draw the log2(R/G) distribution before and after normalization. How do you interpret the results?


rawdataQCnoBgMedian <- maNorm(rawdataQCnoBg, norm = "median", echo = T)
## Normalization method: median.
## Normalizing array 1.
rawdataQCnoBgLoess <- maNorm(rawdataQCnoBg, norm = "loess", echo = T)
## Normalization method: loess.
## Normalizing array 1.
rawdataQCnoBgPrintTipLoess <- maNorm(rawdataQCnoBg, norm = "printTipLoess", echo = T)
## Normalization method: printTipLoess.
## Normalizing array 1.

MA plots after normalization

plot(rawdataQCnoBgMedian, legend.func = NULL, main = "norm = Median")

plot(rawdataQCnoBgLoess, legend.func = NULL, main = "norm = Global loess")

plot(rawdataQCnoBgPrintTipLoess, legend.func = NULL, main = "norm = Print tip loess")

Boxplots after normalization

boxplot(rawdataQCnoBgMedian, main = "norm = Median")

boxplot(rawdataQCnoBgLoess, main = "norm = Global loess")

boxplot(rawdataQCnoBgPrintTipLoess, main = "norm = Print tip loess")

plot(density(maM(rawdataQCnoBgLoess),na.rm = T),
     lwd = 2, col = 2, main = "Distribution of log(Ratio)")
lines(density(maM(rawdataQCnoBg), na.rm = T), lwd = 2)
abline(v = 0)
legend(x= 0.5, y= 1.2,c("Before normalization","After normalization with loess"), fill = c(1,2))


Reproductibility

#Date
format(Sys.time(), "%d %B, %Y, %H,%M")
## [1] "30 July, 2025, 13,54"
#Packages used
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-conda-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.4.1/lib/libopenblasp-r0.3.29.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/Paris
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] marray_1.82.0 limma_3.60.6 
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
## [13] statmod_1.5.0     compiler_4.4.1    rstudioapi_0.17.1 tools_4.4.1      
## [17] evaluate_1.0.3    bslib_0.9.0       yaml_2.3.10       rlang_1.1.6      
## [21] jsonlite_2.0.0